Combining acoustic and visual features to detect laughter in adults' speech
Authors
Abstract
Laughter not only conveys the affective state of the speaker but can also be perceived differently depending on the context in which it is used. In this paper, we focus on detecting laughter in adults' speech using the MAHNOB laughter database. The paper explores the use of novel long-term acoustic features to capture the periodic nature of laughter and of computer vision-based smile features to analyze laughter. Using a cost-sensitive learning approach with a random forest classifier of 100 trees, leave-one-speaker-out cross-validation yielded a classification accuracy of 93.06% with acoustic features alone and 89.48% with visual features alone. Early fusion of the audio and visual features improved the accuracy by an absolute 3.79% over acoustic features alone, to 96.85%. The results indicate that the novel acoustic features capture the repetitive characteristics of laughter, and that the vision-based smile features provide complementary visual cues to discriminate between speech and laughter. The significant finding of the study is that the fusion of audio-visual features not only improves accuracy but also reduces false positives.
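The pipeline described in the abstract (early fusion of audio and visual features, a 100-tree cost-sensitive random forest, and leave-one-speaker-out cross-validation) can be sketched as follows. This is an illustrative sketch with synthetic placeholder data, not the authors' code; the feature dimensions, class weighting choice, and random seeds are assumptions.

```python
# Sketch: early audio-visual fusion + cost-sensitive random forest,
# evaluated with leave-one-speaker-out cross-validation.
# All data below are synthetic placeholders, not the MAHNOB database.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_samples, n_speakers = 200, 5
audio_feats = rng.normal(size=(n_samples, 20))   # stand-in for long-term acoustic features
visual_feats = rng.normal(size=(n_samples, 10))  # stand-in for smile-based visual features
labels = rng.integers(0, 2, size=n_samples)      # 1 = laughter, 0 = speech
speakers = rng.integers(0, n_speakers, size=n_samples)

# Early fusion: concatenate the feature vectors before classification.
fused = np.hstack([audio_feats, visual_feats])

# Cost-sensitive learning approximated here via balanced class weights;
# 100 trees as stated in the abstract.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0)

# Leave-one-speaker-out: each fold holds out all samples of one speaker.
accuracies = []
for train_idx, test_idx in LeaveOneGroupOut().split(fused, labels,
                                                    groups=speakers):
    clf.fit(fused[train_idx], labels[train_idx])
    accuracies.append(clf.score(fused[test_idx], labels[test_idx]))

print(f"Mean leave-one-speaker-out accuracy: {np.mean(accuracies):.3f}")
```

With random labels the accuracy hovers near chance; the point of the sketch is the evaluation protocol, in which no speaker appears in both the training and test folds.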
Similar articles
Laughter detection using ALISP-based N-Gram models
Laughter is a very complex behavior that communicates a wide range of messages with different meanings. It is highly dependent on social and interpersonal attributes. Most of the previous works (e.g. [1, 2]) on automatic laughter detection from audio use frame-level acoustic features as parameters to train machine learning techniques such as Gaussian Mixture Models (GMMs), Support Vecto...
Acoustic and phonetic differences in laughter of male children and adults
This paper compares the acoustic differences in spontaneous recordings of child and adult laughter. Results indicate that mean pitch and intensity of laughter are significantly different between adults and children, but they follow expected speech patterns. Children also have higher vocal tract resonant frequencies than adults. However, both groups were similar in differentiating the non...
Detection of children's paralinguistic events in interaction with caregivers
Paralinguistic cues in children’s speech convey the child’s affective state and can serve as important markers for the early detection of autism spectrum disorder (ASD). In this paper, we detect paralinguistic events, such as laughter and fussing/crying, along with toddlers’ speech from the Multi-modal Dyadic Behavior Dataset (MMDB). We use both spectral and prosodic acoustic features selected ...
Characteristic contours of syllabic-level units in laughter
Trying to automatically detect laughter and other nonlinguistic events in speech raises a fundamental question: is it appropriate to simply adopt acoustic features that have traditionally been used for analyzing linguistic events? We therefore take a step back and propose syllabic-level features that may show a contrast between laughter and speech in their intensity, pitch, and timbral contours an...
Audio-visual Laughter Synthesis System
In this paper we give an overview of a project aimed at building an audio-visual laughter synthesis system. The same approach is followed for acoustic and visual synthesis. First, a database was built to provide synchronous audio and 3D visual landmark tracking data. These data were then used to build separate HMM models of acoustic laughter and visual laughter. Visual laughter model...